Technotools

home *** CD-ROM | disk | FTP | other *** search

/ Technotools / Technotools (Chestnut CD-ROM)(1993).ISO / misc_pto / b-trees / btrees.txt

Wrap

Text File | 1990-01-08 | 9KB | 186 lines

B-Trees SCSC 420 - File Management University of South Carolina J. E. Clary The concept of B-Trees grew from a competition by leading computer manufacturers and independent research groups in the late 1960's. This competition was to develop general methods for storing and retrieving data in large file systems that would provide rapid access to the data with a minimal overhead cost. Among the competitors were R. Bayer and E. McCreight, who were employees of the Boeing Corporation at the time. In 1972, Bayer and McCreight published an article entitled "Organization and Maintenance of Large Ordered Indexes" which first exposed the world to the conception of B- Tree organization. The need for a new method to access data arose from the long seek times associated with ordinary access methods and the need to have a sorted index in order to perform binary searches. The overhead of maintaining a sorted index was deemed unacceptable so a new method was needed. One explored solution was to use binary tree. This method would alleviate the need to resort the index with each new entry or deletion, but it too had some inherent problems. Because the binary tree is located on secondary storage, the number of seeks necessary to satisfy a search can be very high. A successful strategy for this is to reduce the seeks by paging groups of nodes in and out of memory. Another difficulty with binary trees is that they are constructed starting with the root and working down through the nodes. After construction, it is possible that the tree root could contain the wrong keys, making it necessary to find ways to repair the problem with rearrangement algorithms. Bayer and McCreight recognized that the decision to construct the tree from the root down, was in itself the problem. Rather than find a way to undo the problems inherent to binary trees, they concentrated on algorithms to construct the tree in a useful form to start with. With B-Trees the root node is allowed to emerge, rather than setting it up, and then perhaps having to change it. Each node of a B-Tree consists of an ordered sequence of entries and some pointers. The number of pointers is always equal to the number of entries + 1. The maximum number of entries that can be stored in a node is called the order of the B-Tree. When a node becomes full, it is split into two nodes and the entries evenly distributed between the old node and the new one. Since we now have two nodes, there is a need for a higher level in the tree to enable us to choose between the two nodes when searching. This new node, in essence will become our new root node. When a new root node is created, it needs to contain an entry also. The method to accomplish this is to move the entry that separates the two lower level nodes into the root. This is called promoting the entry. By using these methods to construct a tree, we are assured that the tree is always in balance, with regard to height. The path length from the root to any leaf is the same as the path length from the root to any other leaf. Another positive aspect of this type construction is that the entries that are promoted, are the entries that we want to be in the root node. By working up from the leaf level, splitting and promoting as nodes fill up, we eliminate the problems encountered with ordinary binary trees. Because of the properties of a B-Tree, many of the operations on the tree can be done recursively. The simplest of these operations is searching. The search through a B-Tree is divided into two distinctive parts. First the node where the entry should reside is located. Then the node itself is searched for the particular entry. The next operation to be considered is insertion of new data into the B-Tree. First, the Insertion algorithm must locate the node in which the new data item is to be placed. Second, the algorithm searches for the possibility that the data to be inserted is already contained in the tree. If the data is not already in the tree, the correct insertion point is located. At this point the insertion is made. There is always the possibility that the node where the data is to be inserted may be full. If this is the case, the insertion algorithm must be able to control the promotion and splitting of the node. Depending on the contents of the B-Tree, this promotion and splitting may filter up through the higher nodes and may even cause a new root node to be created. The process of splitting and promoting used in the insertion algorithm ensures that the rules of a B-Tree be maintained. This is also the case for the deletion algorithm. The removal of an entry must occur at the leaf level. When the desired entry is located in a leaf, simply delete that entry. If the entry is not in a leaf, we swap the entry with its immediate successor and delete it then. At times, the deletion of an entry will cause the B- Tree to experience underflow. Underflow is the condition where the number of entries in a leaf is less than the minimum number of entries allowed by that order B-Tree. When underflow occurs, the first thing to do is to look at the left sibling. If it has more than the minimum number of entries allowed, redistribute the both nodes entries so that they each contain at least the minimum entries allowed. If neither sibling has more than the minimum, concatenate the two leaves and along with their median entry from the parent node into one leaf. When concatenation occurs, you must check the parent to make sure it does not underflow. Another condition that can occur during deletion is when the root experiences undeflow. When this happens the remaining entry in the root is absorbed into a new root. This in turn causes the height of the B-Tree to be decreased. These are just the essentials of B-Tree implementation. Over the years since Bayer and McCreight, the concept of the B-Tree has evolved into a more efficient entities. One derivative, B*-Trees uses enhanced splitting and redistribution processes so as to increase the storage utilization. Others go a step further and create Virtual B-Trees. Reguardless of the implementation, the B-Tree as proposed by Bayer and McCreight has become the de facto standard for indexes into a data base system. Sources: File and Database Techniques, James Bradley, University of Calgary File Design & Programming, W. Wesley Peterson and Art Lew, University of Hawaii at Manoa File Management Techniques, Billy G. Claybrook, The MITRE Corporation File Structures, Michael J. Folik and Bill Zoellick, Oklahoma State University Pascal plus Data Structures, Nell Dale, University of Texas as Austin Understanding Data Base Management Systems, Joseph A. Vasta, Bethlehem Steel Corporation